Streamlined Toolkit for Real-time Exploratory Analysis of Multiomics
Squares represent transcription factors, circles are target genes. Node colour reflects log fold change using a purple–orange scale (purple = higher in the test group, orange = higher in the reference group). Teal edges indicate positive influence; orange edges indicate negative influence. Hovering highlights neighbours and clicking a node updates the expression box plot on the right.
AnnData object with X_umap coordinates, a clustering column ("new_leiden_sanitized"), QC metrics, and optional nearest-neighbour connectivities. When “CITE-seq (MuData)” is enabled the RNA modality is extracted from MuData and paired with an antibody/protein modality for ADT plots.

Categorical columns in adata.obs are collected with their palette stored in adata.uns[<cat>_colors]. QC arrays (n_counts, n_genes, mitochondrial/ribosomal percentages, doublet scores) plus optional batch and cell_cycle labels are exported for the QC tab. If a connectivities matrix is present, its CSR indices are sent so the client can recompute cluster neighbourhoods.

For each cell type (adata.obs[celltype_var]) we load precomputed edgeR likelihood-ratio results from ./STREAM/Pseudobulk/<cell>_{contrast_var}_edgeR_results.csv. Column names are normalised (gene_symbol, LR, FDR, log2FoldChange/logFC).

Genes with FDR < 0.05 define the marker union used to seed volcano plots, DE tables, and (optionally) the subset of genes delivered to the browser. Separate up/down lists (top 50 by fold-change sign) are cached for AI prompts.

The expression matrix (_edgeR_results.csv) together with metadata (_metadata.csv) backs the box plots and GSVA panels. The first metadata column names the sample ID; the column Group_var (set elsewhere in the notebook) specifies the experimental grouping shown in the UI.

mt.ulm is applied to edgeR LR statistics against the curated BTM_for_GSEA_20131008_fixed.gmt network. Terms with padj < 0.05 are prioritised by adjusted p-value (tie-breaking on activity) and the top 25 per cell type are stored (the top 5 power the “BTM markers” chips). Sample-level GSVA scores are recomputed from the TMM-normalised expression matrix embedded in _edgeR_results.csv and tested against Group_var using an ANOVA/Spearman helper (global_test), populating the box plots and summary stats.

[…] dc.op.hallmark(organism="human"), keeping pathways with significant activity. GSVA-derived distributions and FDR-adjusted global tests are exposed alongside the enrichment table.

[…] dc.op.collectri as the regulatory network. ULM scores are filtered at padj < 0.05, and the resulting TF network is decorated with per-target log fold-changes. Additional GSVA analyses reuse the TMM-normalised expression matrix to deliver per-group activity distributions for the “TF activity” tab.
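The marker-selection and term-prioritisation rules described above can be sketched in plain Python. This is an illustration only, not STREAM's actual code: the function names and the dictionary layout are hypothetical, the up/down lists are assumed to be ranked by fold-change magnitude within each sign, and ties on adjusted p-value are assumed to be broken by larger absolute activity.

```python
from typing import Dict, List, Tuple


def normalise_row(row: Dict) -> Dict:
    """Return a copy of the row with logFC renamed to log2FoldChange."""
    out = dict(row)
    if "log2FoldChange" not in out and "logFC" in out:
        out["log2FoldChange"] = out.pop("logFC")
    return out


def select_markers(rows: List[Dict], fdr_cutoff: float = 0.05,
                   top_n: int = 50) -> Dict[str, List[str]]:
    """Build the marker union and the up/down lists from edgeR result rows.

    Assumption: within each sign, genes are ranked by fold-change magnitude.
    """
    rows = [normalise_row(r) for r in rows]
    sig = [r for r in rows if r["FDR"] < fdr_cutoff]
    up = sorted((r for r in sig if r["log2FoldChange"] > 0),
                key=lambda r: r["log2FoldChange"], reverse=True)[:top_n]
    down = sorted((r for r in sig if r["log2FoldChange"] < 0),
                  key=lambda r: r["log2FoldChange"])[:top_n]
    return {
        "markers": [r["gene_symbol"] for r in sig],
        "up": [r["gene_symbol"] for r in up],
        "down": [r["gene_symbol"] for r in down],
    }


def prioritise_terms(terms: List[Dict], padj_cutoff: float = 0.05,
                     keep: int = 25, chips: int = 5) -> Tuple[List[Dict], List[str]]:
    """Rank significant terms by padj, then keep the top `keep` and top `chips`.

    Assumption: ties on padj are broken by larger absolute activity.
    """
    sig = [t for t in terms if t["padj"] < padj_cutoff]
    ranked = sorted(sig, key=lambda t: (t["padj"], -abs(t["activity"])))
    stored = ranked[:keep]
    return stored, [t["term"] for t in stored[:chips]]
```

Each input row is a dictionary with the normalised column names (gene_symbol, LR, FDR, and either log2FoldChange or logFC); each term is a dictionary with hypothetical keys term, padj and activity.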
Enabling “Generate biology insights” triggers LLM-backed summaries per cell type. The model receives only aggregate artefacts (cluster sizes, edgeR up/down lists, top BTM/Hallmark/TF hits and their scores) together with study context from PROJECT_TITLE/PROJECT_CONTEXT/CONTRAST_VAR. Outputs are harmonised into a strict Markdown template with canonical/fine labels, confidence, and key markers, and an overview prompt synthesises tissue-wide takeaways. Raw counts or per-cell matrices never leave the session.
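To make the "aggregate artefacts only" point concrete, the following is a minimal sketch of how a per-cell-type payload could be assembled. The function names, field names and the choice of five hits per resource are assumptions for illustration; the source only states which kinds of information are sent.

```python
import json
from typing import Any, Dict, List, Sequence


def top_hits(hits: Sequence[Dict[str, Any]], n: int = 5) -> List[Dict[str, Any]]:
    """Keep the n hits with the smallest padj, retaining only summary fields."""
    ranked = sorted(hits, key=lambda h: h["padj"])[:n]
    return [{"name": h["name"], "score": h["score"], "padj": h["padj"]}
            for h in ranked]


def build_payload(cell_type: str, cluster_size: int,
                  up: Sequence[str], down: Sequence[str],
                  btm: Sequence[Dict[str, Any]],
                  hallmark: Sequence[Dict[str, Any]],
                  tf: Sequence[Dict[str, Any]],
                  context: Dict[str, str]) -> Dict[str, Any]:
    """Assemble a JSON-serialisable summary for one cell type.

    Only aggregate values go in: sizes, gene lists, and top enrichment hits.
    No count matrix or per-cell array is accepted by this function.
    """
    return {
        "cell_type": cell_type,
        "cluster_size": cluster_size,
        "up_genes": list(up),
        "down_genes": list(down),
        "btm": top_hits(btm),
        "hallmark": top_hits(hallmark),
        "tf": top_hits(tf),
        # Study context, e.g. PROJECT_TITLE / PROJECT_CONTEXT / CONTRAST_VAR.
        "context": dict(context),
    }


def to_json(payload: Dict[str, Any]) -> str:
    """Serialise the payload; this fails loudly if a non-JSON object slips in."""
    return json.dumps(payload, indent=2)
```

Restricting the function signature to lists and small dictionaries is one way to keep raw matrices out of the prompt by construction rather than by convention.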
[…] Group_var; Statistics — edgeR LRT against the active contrast with Benjamini-Hochberg FDR, box plots display log-transformed edgeR-normalised counts; AI — top up/down lists here are passed verbatim to the LLM prompts.
[…] adata.obs field; Statistics — descriptive distributions built from the same quantised expression vectors, no new inference; AI — baseline cell fractions (and optional deltas if provided during build) are forwarded as baseline_fraction/delta_fraction in the prompts.
[…] global_test (ANOVA for categorical Group_var, Spearman otherwise); AI — the ranked modules and q-values are embedded in each cell-type prompt.
[…] global_test workflow as BTM; AI — significant pathways and FDR values are injected into the prompts.
[…] collectri ULM scores with BH-adjusted padj plus GSVA/global_test; AI — the top TFs and q-values populate the regulator section of the AI payload.
[…] global_test pipeline applied to the user-supplied GMT; AI — significant terms are appended to the custom section of each prompt.
ROLE
You are “GeneSage,” a PhD-level molecular biologist and single-cell bioinformatics specialist.
OBJECTIVE
Predict the most likely cell type for cluster {cluster} in dataset {dataset_name} using ONLY the inputs provided.
INPUTS
Cluster size: {size}
{top_marker_lines}
Additional marker genes per method (JSON): {cl_markers}
BTM modules (JSON): {cl_btm_json}
Hallmark pathways (JSON): {cl_hall_json}
Tissue context: {tissue_context}
OUTPUT
Return a concise markdown paragraph with the canonical label, an optional fine
label, confidence (High/Medium/Low), and key markers supporting the call.
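The placeholders in the template above can be filled with ordinary Python string formatting. The sketch below assumes that approach; the helper name render_prompt, the shortened template, and the "Top markers: ..." line format are hypothetical, since the source does not show how {top_marker_lines} is built.

```python
import json
from typing import Dict, List

# Shortened copy of the INPUTS part of the template, for illustration only.
PROMPT_TEMPLATE = """\
OBJECTIVE
Predict the most likely cell type for cluster {cluster} in dataset {dataset_name} using ONLY the inputs provided.
INPUTS
Cluster size: {size}
{top_marker_lines}
Additional marker genes per method (JSON): {cl_markers}
BTM modules (JSON): {cl_btm_json}
Hallmark pathways (JSON): {cl_hall_json}
Tissue context: {tissue_context}
"""


def render_prompt(cluster: str, dataset_name: str, size: int,
                  top_markers: List[str],
                  markers_by_method: Dict[str, List[str]],
                  btm: List[dict], hallmark: List[dict],
                  tissue_context: str) -> str:
    """Fill the template; structured inputs are embedded as JSON strings."""
    # Hypothetical formatting of the marker line.
    top_marker_lines = "Top markers: " + ", ".join(top_markers)
    return PROMPT_TEMPLATE.format(
        cluster=cluster,
        dataset_name=dataset_name,
        size=size,
        top_marker_lines=top_marker_lines,
        cl_markers=json.dumps(markers_by_method),
        cl_btm_json=json.dumps(btm),
        cl_hall_json=json.dumps(hallmark),
        tissue_context=tissue_context,
    )
```

Because str.format does not re-scan substituted values, the braces inside the JSON strings are inserted literally and do not need escaping.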
QC and composition tabs reflect the layer chosen at run time; when multiple layers are available the same dashboard can be regenerated for alternative normalisations.
Interactive data mining and analysis of bulk and single-cell expression data — delivered as a fast, shareable HTML dashboard with UMAPs, marker discovery, and enrichment for BTM modules and Hallmark pathways.
STREAM is a zero-install workflow that transforms differential-expression analyses from bulk, single-cell and pseudobulk RNA-seq experiments into an interactive HTML dashboard. It is designed to make exploratory analysis reproducible and accessible by automating quality control (QC), clustering, marker discovery, enrichment analysis and multi-scale summarization. Instead of requiring collaborators to rerun notebooks, STREAM bundles all analyses, including interactive plots, statistics and provenance, into a single offline-capable web page.
Single Cell MultiOmics Lab
If you use STREAM in your work, please cite: STREAM (v2.0) — Streamlined Toolkit for Real-time Exploratory Analysis of Multiomics, generated 2025-10-21T11:47:11.
Disclaimer. STREAM is a research and education tool. It is not intended for clinical, diagnostic, or patient-management decisions.
MIT License

Copyright (c) 2025 Single Cell MultiOmics Lab / STREAM Authors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.